NSF PAR Search | NSF Public Access Repository

Efficient Online Reinforcement Learning for Diffusion Policy

Ma, Haitong; Chen, Tianyi; Wang, Kai; Li, Na; Dai, Bo (July 2025, PMLR)

Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from the target distribution, which is impossible in online RL since we cannot sample from the optimal policy. Backpropagating the policy gradient through the diffusion process incurs huge computational costs and instability, making it expensive and hard to scale. To enable efficient training of diffusion policies in online RL, we generalize conventional denoising score matching by reweighting the loss function. The resulting Reweighted Score Matching (RSM) preserves the optimal solution and low computational cost of denoising score matching, while eliminating the need to sample from the target distribution and allowing the learning procedure to optimize value functions. We introduce two tractable reweighted loss functions to solve two commonly used policy optimization problems, policy mirror descent and max-entropy policy, resulting in two practical algorithms named Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC). We conducted comprehensive comparisons on MuJoCo benchmarks. The empirical results show that the proposed algorithms outperform recent diffusion-policy online RL methods on most tasks, and that DPMD improves by more than 120% over soft actor-critic on Humanoid and Ant.
Free, publicly-accessible full text available July 13, 2026
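For readers unfamiliar with the baseline the abstract above refers to, the following is a minimal sketch of the conventional denoising score matching loss at a single noise level. It is not the paper's Reweighted Score Matching; the toy target N(0, 1), the variance-preserving scaling, and the names `dsm_loss`, `score_fn`, and `sigma` are all assumptions made for illustration. Note that the loss needs samples `x0` from the target distribution, which is exactly the requirement the abstract describes as unavailable in online RL.

```python
import numpy as np

def dsm_loss(score_fn, x0, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss at one noise level."""
    alpha = np.sqrt(1.0 - sigma**2)        # variance-preserving scaling (assumed)
    eps = rng.standard_normal(x0.shape)    # Gaussian noise
    x_t = alpha * x0 + sigma * eps         # perturbed sample
    target = -eps / sigma                  # score of the perturbation kernel q(x_t | x0)
    return float(np.mean((score_fn(x_t) - target) ** 2))

# Toy target distribution N(0, 1); conventional training needs these samples.
x0 = np.random.default_rng(0).standard_normal(100_000)
sigma = np.sqrt(0.5)

# With this scaling the perturbed marginal is still N(0, 1), so its true score is -x.
loss_true = dsm_loss(lambda x: -x, x0, sigma, np.random.default_rng(1))
loss_zero = dsm_loss(lambda x: np.zeros_like(x), x0, sigma, np.random.default_rng(1))
print(loss_true, loss_zero)
```

In this toy setting the true marginal score attains a lower loss than the all-zero score, which is the property that makes the loss usable as a training objective.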
Efficient Duple Perturbation Robustness in Low-rank MDPs

Hu, Yang; Ma, Haitong; Li, Na; Dai, Bo (June 2025, PMLR)
Ozay, Necmiye; Balzano, Laura; Panagou, Dimitra; Abate, Alessandro (Ed.)
The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from computational issues that obstruct their real-world implementation. In this paper, we consider MDPs with low-rank structures, where the transition kernel can be written as a linear product of a feature map and factors. We introduce *duple perturbation* robustness, i.e., perturbation of both the feature map and the factors, via a novel characterization of (𝜉,𝜂)-ambiguity sets featuring computational efficiency. Our novel low-rank robust MDP formulation is compatible with the low-rank function representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. Meanwhile, it also gives rise to a provably efficient and practical algorithm with a theoretical convergence rate guarantee. Lastly, the robustness of our proposed approach is justified by numerical experiments, including classical control tasks with continuous state-action spaces.
Free, publicly-accessible full text available June 4, 2026
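As a reading aid for the abstract above, the low-rank structure it mentions is commonly written as an inner product between a feature map and a vector of factors. The symbols $\phi$, $\mu$, and the rank $d$ below are assumed notation from the general low-rank MDP literature and may differ from the paper's own; the precise form of the (𝜉,𝜂)-ambiguity sets is not given in the abstract and is not reproduced here.

$$
P(s' \mid s, a) \;=\; \langle \phi(s, a), \mu(s') \rangle \;=\; \sum_{i=1}^{d} \phi_i(s, a)\, \mu_i(s')
$$

Under this reading, duple perturbation means that both $\phi$ (the feature map) and $\mu$ (the factors) are allowed to deviate from their nominal values.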
Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

https://doi.org/10.1109/IROS58592.2024.10801637

Ma, Haitong; Ren, Zhaolin; Dai, Bo; Li, Na (October 2024, IEEE)

Full Text Available
Multi-Agent Coverage Control with Transient Behavior Consideration

Zhang, Runyu; Ma, Haitong; Li, Na (July 2024, Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1464-1476, 2024.)

Full Text Available
Distributed Thompson Sampling Under Constrained Communication

https://doi.org/10.1109/LCSYS.2024.3525096

Zerefa, Saba; Ren, Zhaolin; Ma, Haitong; Li, Na (January 2024, IEEE Control Systems Letters)

Full Text Available
Gaussian Max-Value Entropy Search for Multi-Agent Bayesian Optimization

https://doi.org/10.1109/IROS55552.2023.10341675

Ma, Haitong; Zhang, Tianpeng; Wu, Yixuan; Calmon, Flavio P.; Li, Na (October 2023, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

Full Text Available
